Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
While deep learning (DL) models are state-of-the-art in text and image
domains, they have not yet consistently outperformed Gradient Boosted Decision
Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent
performance gains attained by DL models in text and image tasks have used
unsupervised pretraining, which exploits orders of magnitude more unlabeled
data than labeled data. To the best of our knowledge, unsupervised pretraining
has not been applied to the LTR problem, which often produces vast amounts of
unlabeled data.
In this work, we study whether unsupervised pretraining of deep models can
improve LTR performance over GBDTs and other non-pretrained models. By
incorporating simple design choices--including SimCLR-Rank, an LTR-specific
pretraining loss--we produce pretrained deep learning models that consistently
(across datasets) outperform GBDTs (and other non-pretrained rankers) in the
case where there is more unlabeled data than labeled data. This performance
improvement occurs not only on average but also on outlier queries. We base our
empirical conclusions on experiments with (1) public benchmark tabular LTR
datasets and (2) a large industry-scale proprietary ranking dataset. Code is
provided at https://anonymous.4open.science/r/ltr-pretrain-0DAD/README.md.
Comment: ICML-MFPL 2023 Workshop Ora
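The abstract does not spell out the SimCLR-Rank objective, but it builds on SimCLR's contrastive (NT-Xent) loss, which pulls two augmented views of the same example together and pushes all other examples in the batch apart. A minimal NumPy sketch of that base objective, applied to batches of embedded feature vectors (function name and temperature value are illustrative, not taken from the paper):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR's NT-Xent loss for two augmented views z1, z2 of the
    same batch of examples (each of shape [batch, dim])."""
    z = np.concatenate([z1, z2], axis=0)               # 2N embeddings
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = z1.shape[0]
    # Mask self-similarity so an embedding never counts as its own candidate.
    np.fill_diagonal(sim, -np.inf)
    # The positive pair for row i is the other view, at row (i + n) mod 2n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Log-softmax over candidates, evaluated at each row's positive.
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

When the two views of each example agree, the positives dominate the softmax and the loss is small; views that match the wrong rows yield a larger loss.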
Reputation Systems and Incentives Schemes for Quality Control in Crowdsourcing
Crowdsourcing combines the abilities of computers and humans to solve tasks that computers find difficult. In crowdsourcing, computers process and aggregate input that is solicited from human workers; thus, the quality of workers' input is crucial to the success of crowdsourced solutions. Performing quality control at scale is a difficult problem: workers can make mistakes, and computers alone, without human input, cannot verify the solutions. We develop reputation systems and incentive schemes for quality control in the context of different crowdsourcing applications. To have a concrete source of crowdsourced data, we built CrowdGrader, a web-based peer grading tool that lets students submit and grade solutions to homework assignments. In CrowdGrader, each submission receives several student-assigned grades, which are aggregated into the final grade using a novel algorithm based on a reputation system. We first overview our work and the results on peer grading obtained via CrowdGrader. Then, motivated by our experience, we propose hierarchical incentive schemes that are truthful and cheap. The incentives are truthful because the optimal worker behavior consists in providing accurate evaluations. The incentives are cheap because they leverage the hierarchy, so that they can be implemented with a small number of supervised evaluations, and the strength of the incentive does not weaken with increasing hierarchy depth. We show that the proposed hierarchical schemes are robust: they provide incentives in heterogeneous environments where workers can have limited proficiencies, as long as there are enough proficient workers in the crowd. Interestingly, we also show that for these schemes to work, the only requisite is that workers know their place in the hierarchy in advance.
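The abstract does not give CrowdGrader's actual aggregation algorithm; the following is only a generic sketch of the reputation idea it describes, alternating between a reputation-weighted consensus grade per submission and a per-grader reputation that shrinks with their deviation from consensus (all names, the iteration count, and the smoothing constant are hypothetical):

```python
def aggregate_with_reputation(grades, n_iters=10, eps=0.1):
    """grades: {submission: {grader: grade}}.
    Returns (consensus, reputation) after alternating between
    (1) a reputation-weighted mean grade per submission and
    (2) a reputation per grader that is the inverse of their mean
        squared deviation from the current consensus (plus eps)."""
    graders = {g for gs in grades.values() for g in gs}
    rep = {g: 1.0 for g in graders}  # start with uniform reputations
    consensus = {}
    for _ in range(n_iters):
        for sub, gs in grades.items():
            total = sum(rep[g] for g in gs)
            consensus[sub] = sum(rep[g] * v for g, v in gs.items()) / total
        for g in graders:
            devs = [(v - consensus[s]) ** 2
                    for s, gs in grades.items()
                    for gg, v in gs.items() if gg == g]
            rep[g] = 1.0 / (eps + sum(devs) / len(devs))
    return consensus, rep
```

With two accurate graders and one who grades far from everyone else, the outlier's reputation collapses and the consensus converges toward the grades the accurate graders agree on.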
As part of our study of user work in crowdsourcing and collaborative environments, we also study the problem of authorship attribution in revisioned content such as Wikipedia, where virtually anyone can edit an article. Information about the origin of a contribution is important for building a reputation system, as it can be used to assign reputation to editors according to the quality of their contributions. Since anyone can edit an article, a robust method for attributing a new revision has to analyze all previous revisions of the article. We describe a novel authorship attribution algorithm that can scale to very large repositories of revisioned content, as we show with experimental data from the English Wikipedia.
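The paper's scalable attribution algorithm is not reproduced here; as a drastically simplified illustration of the underlying idea, one can credit each token of the latest revision to the earliest revision in which it appeared (real systems match longer contexts than single tokens, precisely to be robust against copying and reverts):

```python
def attribute_tokens(revisions):
    """revisions: list of (author, text) pairs in chronological order.
    Returns, for the latest revision, a list of (token, author) pairs
    crediting each token to the earliest revision that contained it.
    A toy simplification: matching single tokens, not longer contexts."""
    first_author = {}
    for author, text in revisions:
        for tok in text.split():
            first_author.setdefault(tok, author)  # keep earliest author
    latest_text = revisions[-1][1]
    return [(tok, first_author[tok]) for tok in latest_text.split()]
```

For example, if a second editor inserts one new word into an existing sentence, only that word is attributed to them.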
CrowdGrader: A Tool For Crowdsourcing the Evaluation of Homework Assignments
CrowdGrader is a system that lets students submit and collaboratively review and grade homework. We describe the techniques and ideas used in CrowdGrader, and report on the experience of using CrowdGrader in disciplines ranging from Computer Science to Economics, Writing, and Technology. In CrowdGrader, students receive an overall crowd-grade that reflects both the quality of their homework and the quality of their work as reviewers. This creates an incentive for students to provide accurate grades and helpful reviews of other students' work. Instructors can use the crowd-grades as final grades, or fine-tune the grades according to their wishes. Our results on seven classes show that students actively participate in the grading and write reviews that are generally helpful to the submissions' authors. The results also show that grades computed by CrowdGrader are sufficiently precise to be used as the homework component of class grades. Students report that the main benefits of using CrowdGrader are the quality of the reviews they receive and the ability to learn from reviewing their peers' work. Instructors can leverage peer learning in their classes, and easily handle homework evaluation in large classes.
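The abstract says the crowd-grade reflects both submission quality and reviewing quality but does not give the combination rule; one simple way to realize that idea is a convex combination of the two scores (the function, the shared 0-10 scale, and the 0.25 weight are all illustrative assumptions, not CrowdGrader's actual settings):

```python
def crowd_grade(submission_grade, review_accuracy, review_weight=0.25):
    """Combine a student's aggregated submission grade with a score for
    the accuracy of the reviews they wrote (both on the same 0-10 scale).
    review_weight sets how much reviewing quality counts."""
    return ((1 - review_weight) * submission_grade
            + review_weight * review_accuracy)
```

A student with a submission grade of 8 who reviews perfectly (10) would then receive a crowd-grade of 8.5, rewarding careful reviewing without letting it dominate.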
Incentives for Truthful Peer Grading
Peer grading systems work well only if users have incentives to grade
truthfully. An example of non-truthful grading, that we observed in classrooms,
consists in students assigning the maximum grade to all submissions. With a
naive grading scheme, such as averaging the assigned grades, all students would
receive the maximum grade. In this paper, we develop three grading schemes that
provide incentives for truthful peer grading. In the first scheme, the
instructor grades a fraction p of the submissions, and penalizes students whose
grade deviates from the instructor grade. We provide lower bounds on p to
ensure truthfulness, and conclude that these schemes work only for moderate
class sizes, up to a few hundred students. To overcome this limitation, we
propose a hierarchical extension of this supervised scheme, and we show that it
can handle classes of any size with a bounded (and small) amount of instructor work, and
is therefore applicable to Massive Open Online Courses (MOOCs). Finally, we
propose unsupervised incentive schemes, in which the student incentive is based
on statistical properties of the grade distribution, without any grading
required by the instructor. We show that the proposed unsupervised schemes
provide incentives to truthful grading, at the price of being possibly unfair
to individual students.
Comment: 26 page
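The paper's actual lower bounds on p are not reproduced in the abstract; the sketch below only illustrates the incentive arithmetic behind the supervised scheme: lazy grading saves a student some effort, but with probability p the instructor grades the same submission and applies a penalty proportional to the deviation, so truthfulness requires the expected penalty to exceed the effort saved (all quantities are hypothetical):

```python
def min_check_fraction(effort_saved, expected_deviation, penalty_rate):
    """Smallest instructor-grading fraction p that makes truthful grading
    a best response in this toy model, from the incentive constraint
        p * penalty_rate * expected_deviation >= effort_saved."""
    return effort_saved / (penalty_rate * expected_deviation)

# Hypothetical numbers: lazy grading saves 1 unit of effort, deviates by
# 4 grade points on average, and the penalty is 1 unit per point:
p = min_check_fraction(effort_saved=1.0, expected_deviation=4.0,
                       penalty_rate=1.0)  # -> 0.25
```

In this toy model the instructor must grade at least a quarter of the submissions; since that workload grows linearly with class size, it matches the abstract's point that the flat supervised scheme only scales to moderate class sizes.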
Genetic Structures and Conditions of their Expression, which Allow Receiving Native Recombinant Proteins with High Output
We investigated the possibility of obtaining native recombinant amyloidogenic proteins by creating genetic constructs encoding fusion proteins of the target proteins with Super Folder Green Fluorescent Protein (sfGFP). In this study, we show that constructs containing the sfGFP gene provide synthesis, in a bacterial system, of fusion proteins with minimal formation of inclusion bodies. Constructs containing the genes of the target proteins in the 3'-terminal region of the sfGFP gene, followed by a polynucleotide sequence that allows for affinity purification of the fusion proteins, are optimal. Heating bacterial cultures at 42°C for 30 min (heat shock) before inducing expression of the recombinant genes was found to increase the yield of the desired products, practically avoiding the formation of insoluble aggregates.